Stochastic Shortest Path Problems Under Weak Conditions
Authors
Dimitri Bertsekas, Huizhen Yu
Abstract
In this paper we weaken the conditions under which some of the basic analytical and algorithmic results for finite-state stochastic shortest path problems hold. We provide an analysis under three types of assumptions, under all of which the standard form of policy iteration may fail, and other anomalies may occur. In the first type of assumptions, we require a standard compactness and continuity condition, as well as the existence of an optimal proper policy, thereby allowing positive and negative costs per stage, and improper policies with finite cost at all states. The analysis is based on introducing an additive perturbation δ > 0 to the cost per stage, which drives the cost of improper policies to infinity. By considering the δ-perturbed problem and taking the limit as δ ↓ 0, we show the validity of Bellman’s equation and value iteration, and we construct a convergent policy iteration algorithm that uses a diminishing sequence of perturbations. In the second type of assumptions we require nonpositive one-stage costs and we give policy iteration algorithms that are optimistic and do not require the use of perturbations. In the third type of assumptions we require nonnegative one-stage costs, as well as the compactness and continuity condition, and we convert the problem to an equivalent stochastic shortest path problem for which the existing theory applies. Using this transformation, we address the uniqueness of solution of Bellman’s equation, the convergence of value iteration, and the convergence of some variants of policy iteration. Our analysis and algorithms under the second and third type of assumptions fully apply to finite-state positive (reward) and negative (reward) dynamic programming models.
† Dimitri Bertsekas is with the Dept. of Electr. Engineering and Comp. Science, and the Laboratory for Information and Decision Systems, M.I.T., Cambridge, Mass., 02139. His research was supported by the Air Force Grant FA9550-10-1-0412.
‡ Huizhen Yu is with the Laboratory for Information and Decision Systems, M.I.T., Cambridge, Mass., 02139. Her research was supported by the Air Force Grant FA9550-10-1-0412.
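The perturbation idea in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm (the paper builds a policy iteration method). It only runs plain value iteration on a δ-perturbed problem for a decreasing sequence of δ values. The one-state model, the cost B = -1, the starting guess, the δ values, and all function names are assumptions chosen for illustration.

```python
# Toy SSP: one non-terminal state "1" and a cost-free terminal state "t".
# All numbers below are illustrative assumptions, not taken from the paper.
B = -1.0  # cost of moving from "1" to "t" (the proper policy)

# MODEL[state][control] = (one-stage cost, {next_state: probability})
MODEL = {
    "1": {
        "stay": (0.0, {"1": 1.0}),  # zero-cost self-loop: an improper policy
        "go": (B, {"t": 1.0}),      # reaches the terminal state in one step
    }
}


def bellman_sweep(J, delta):
    """Apply one value-iteration sweep with every one-stage cost raised by delta."""
    new_J = {"t": 0.0}
    for state, controls in MODEL.items():
        new_J[state] = min(
            cost + delta + sum(p * J[nxt] for nxt, p in successors.items())
            for cost, successors in controls.values()
        )
    return new_J


def value_iteration(J0, deltas, sweeps=50):
    """Run `sweeps` sweeps for each delta in turn, warm-starting from the last result."""
    J = dict(J0)
    for delta in deltas:
        for _ in range(sweeps):
            J = bellman_sweep(J, delta)
    return J


start = {"1": -5.0, "t": 0.0}  # deliberately below the optimal cost B
plain = value_iteration(start, deltas=[0.0])
perturbed = value_iteration(start, deltas=[0.5, 0.25, 0.125])
print(plain["1"], perturbed["1"])
```

In this toy model the unperturbed iteration stays at the starting guess, because the zero-cost self-loop makes every value at or below B a fixed point of Bellman's equation. With δ > 0 the self-loop accumulates cost on each sweep, so the iterate climbs to B + δ, which approaches B as δ shrinks.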
Related resources
arXiv:0707.0335v1 [math.OC] 3 Jul 2007 Label-setting methods for Multimode Stochastic Shortest Path problems on graphs
Stochastic shortest path (SSP) problems arise in a variety of discrete stochastic control contexts. An optimal solution to such a problem is typically computed using the value function, which can be found by solving the corresponding dynamic programming equations. In the deterministic case, these equations can be often solved by the highly efficient label-setting methods (such as Dijkstra’s an...
Label-Setting Methods for Multimode Stochastic Shortest Path Problems on Graphs
Stochastic shortest path (SSP) problems arise in a variety of discrete stochastic control contexts. An optimal solution to such a problem is typically computed using the value function, which can be found by solving the corresponding dynamic programming equations. In the deterministic case, these equations can be often solved by highly efficient label-setting methods (such as Dijkstra’s and Dia...
arXiv:0707.0335v2 [math.OC] 29 Feb 2008 Label-setting methods for Multimode Stochastic Shortest Path problems on graphs
Stochastic shortest path (SSP) problems arise in a variety of discrete stochastic control contexts. An optimal solution to such a problem is typically computed using the value function, which can be found by solving the corresponding dynamic programming equations. In the deterministic case, these equations can be often solved by the highly efficient label-setting methods (such as Dijkstra’s and...
Infinite-Space Shortest Path Problems and Semicontractive Dynamic Programming
In this paper we consider deterministic and stochastic shortest path problems with an infinite, possibly uncountable, number of states. The objective is to reach or approach a special destination state through a minimum cost path. We use an optimal control problem formulation, under assumptions that parallel those for finite-node shortest path problems, i.e., there exists a path to the destinat...
Solving Stochastic Shortest-Path Problems with RTDP
We present a modification of the Real-Time Dynamic Programming (RTDP) algorithm that makes it a genuine off-line algorithm for solving Stochastic Shortest-Path problems. Also, a new domain-independent and admissible heuristic is presented for Stochastic Shortest-Path problems. The new algorithm and heuristic are compared with Value Iteration over benchmark problems with large state spaces. The r...
Publication date: 2013